Association of surgeons’ gender with elective surgical lists in the State of Florida is explained by differences in mean operative caseloads

Background A recent publication reported that at three hospitals within one academic health system, female surgeons received less surgical block time than male surgeons, suggesting potential gender-based bias in operating room scheduling. We examined this observation’s generalizability. Methods Our cross-sectional retrospective cohort study of State of Florida administrative data included all 4,176,551 ambulatory procedural encounters and inpatient elective surgical cases performed January 2017 through December 2019 by 8875 surgeons (1830 female) at all 609 non-federal hospitals and ambulatory surgery centers. There were 1,509,190 lists of cases (i.e., combinations of the same surgeon, facility, and date). Logistic regression adjusted for covariables of decile of surgeon’s quarterly cases, surgeon’s specialty, quarter, and facility. Results Selecting randomly a male surgeon’s quarter and a female surgeon’s quarter, for 66% of selections, the male surgeon performed more cases (P < .0001). Without adjustment for quarterly caseloads, lists comprised one case for 44.2% of male and 54.6% of female surgeons (difference 10.4%, P < .0001). A similar result held for lists with one or two cases (difference 9.1%, P < .0001). However, incorporating quarterly operative caseloads, the direction of the observed difference between male and female surgeons was reversed both for case lists with one case (-2.1%, P = .03) and for lists with one or two cases (-1.8%, P = .05). Conclusions Our results confirm the aforementioned single university health system results but show that the differences between male and female surgeons in their lists were not due to systematic bias in operating room scheduling (e.g., completing three brief elective cases in a week on three different workdays) but to differences in their total case numbers. The finding that surgeons performing lists comprising a single case were more often female than male provides a previously unrecognized reason why operating room managers should help facilitate the workload of surgeons performing only one case on operative (anesthesia) workdays.

found no surgical specialty to have average case durations sufficiently long that single cases would usually fill an operating room for the workday [23][24][25][26]. Hypothesis #1 is important because although these low-caseload surgeons account for most surgical growth from year to year [10,25,26], their individual percentage utilization of operating room time, either adjusted (including turnover times) or raw (without turnover times) [27], cannot be measured accurately [23,28]. Such limitations associated with measuring the workload of low-caseload surgeons are an example of the type of knowledge necessary for teams responsible for operating room time to make the best possible decisions [11,13,14,15].
If there were differences in the percentages of lists with one case between female and male surgeons, there would be different work experiences of female and male surgeons in surgical suites [7,21,29]. Understanding the causes of gender-related differences in case scheduling would give insight to operating room managers on how to address potential gender-based concerns related to inequality of access to operating room time. For example, the raw differences between genders might be an artifact from including surgeons' specialty as a confounder, but not considering surgeons' quarterly operative caseloads [22]. Our hypothesis #2 was that gender differences would remain despite incorporating confounders, most specifically, quarterly caseloads. If hypothesis #2 were rejected, an implication would be that single-case lists are not related to gender bias, but rather to other factors, such as smaller quarterly caseloads.

Materials and methods
The Institutional Review Board of the University of Florida (IRB202002442) approved this research as exempt from patient consent. The Institutional Review Boards of the University of Iowa (October 20, 2021) and the University of Miami (October 21, 2021) determined that the current analyses of de-identified data do not meet the regulatory definition of human subjects research.

Cross-sectional retrospective cohort study using State of Florida data
We used publicly available data from the Agency for Health Care Administration (AHCA) for patients receiving care at non-federal hospitals and ambulatory surgery centers in Florida from January 1, 2017, through December 31, 2019 [22]. Following approval by AHCA, these data were supplemented with the date of each encounter (for ambulatory patients) and the date of each hospital admission (for inpatients). Dates were necessary to obtain our primary endpoint, the percentage of daily lists of cases (i.e., combinations of a surgeon, facility, and date of surgery) per quarter that included one case. Such discharge abstract data do not include case durations or surgical times, but even if they did, percentage utilizations could not be estimated accurately for these surgeons [23,28]. Data use agreements were executed between the University of Florida and AHCA, and between the University of Florida and the University of Miami. AHCA disclaims responsibility for the results and conclusions of the study. Sharing of these data is precluded by those data use agreements; readers interested in analyzing the raw data will need to apply for their use with the AHCA as done by the authors. Table 1 shows the exclusion criteria among the 10,589,761 outpatient encounters and inpatient elective surgical admissions in Florida included in this study [22]. The resulting 4,176,551 cases were performed during the 12 included quarters by 8875 surgeons, with female (21%) or male gender determinable for all surgeons from the National Provider Identifier database (Table 2) [30]. Throughout our paper, we use the terms "male" and "female" because nonbinary options for gender are not provided in the study database.
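The definition of a list (a combination of surgeon, facility, and date of surgery) and the primary endpoint can be illustrated with a minimal Python sketch. The records, identifiers, and function names below are hypothetical and are not taken from the AHCA data or the authors' Stata code; the sketch also ignores the grouping by quarter for brevity.

```python
from collections import Counter

# Hypothetical toy records, one per case: (surgeon_id, facility_id, date_of_surgery).
cases = [
    ("S1", "F1", "2019-03-04"),
    ("S1", "F1", "2019-03-04"),
    ("S1", "F2", "2019-03-04"),
    ("S2", "F1", "2019-03-05"),
    ("S2", "F1", "2019-03-06"),
]


def cases_per_list(records):
    """Count the cases in each list, i.e., each (surgeon, facility, date) combination."""
    return Counter(records)


def percent_single_case_lists(records):
    """Percentage of lists that comprise exactly one case."""
    counts = cases_per_list(records)
    single = sum(1 for n in counts.values() if n == 1)
    return 100.0 * single / len(counts)


lists = cases_per_list(cases)
pct_single = percent_single_case_lists(cases)
print(len(lists), pct_single)
```

In this toy example, surgeon S1 has two lists on the same date because the cases were at two facilities, so the five cases form four lists, three of which comprise a single case.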

Statistical analyses
Full statistical output, including commands and comments, is provided in the supplemental content. These are listed in the same sequence as the Methods and Results for readers interested in more detail or who want to replicate our work with other state or provincial datasets. Stata v17.0 was used for all analyses (StataCorp, College Station, Texas).
Univariate analyses were performed to compare surgeons based on gender. Exact binomial confidence intervals for the area under the receiver operating characteristic curves (c-statistic) of each surgeon's quarterly operative caseload to predict gender [31,32] were calculated. The 99% confidence intervals for the proportion of lists that comprised one case were calculated using the delta method.
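The c-statistic used here has a direct probabilistic reading: the chance that a randomly selected male surgeon-quarter has more cases than a randomly selected female surgeon-quarter. A minimal Python sketch of that pairwise definition follows. The caseload values are hypothetical, counting ties as one half is the usual convention and an assumption here, and the sketch does not reproduce the exact binomial confidence intervals.

```python
def c_statistic(male_counts, female_counts):
    """Probability that a random male surgeon-quarter has more cases than a
    random female surgeon-quarter, with ties counted as one half."""
    wins = 0.0
    for m in male_counts:
        for f in female_counts:
            if m > f:
                wins += 1.0
            elif m == f:
                wins += 0.5
    return wins / (len(male_counts) * len(female_counts))


# Hypothetical quarterly caseloads, for illustration only.
male_quarters = [29, 61, 12, 40]
female_quarters = [13, 7, 30, 29]

c = c_statistic(male_quarters, female_quarters)
print(c)
```

A value of 0.5 would mean that caseload carries no information about gender; values above 0.5 mean that male surgeon-quarters tend to have the larger caseloads.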
Hypothesis #1 assessed the unadjusted association between surgeon gender and whether the list contained one case. Logistic regression was used, with P < .01 treated as significant.
Hypothesis #2 assessed adjusted analyses. Logistic regression was used to control for deciles of cases per quarter, specialty, quarter, and interactions between gender and deciles (Tables 2 and 3). We applied clustering by facility when calculating the standard errors. The supplemental content includes rationales and details related to the calculations, and the associated regression coefficients from the logistic regressions. P < .01 was treated as significant.
Both hypotheses were formulated in terms of differences of proportions. (Although odds ratios were calculated, they grossly overstate relative risks because the prevalence of surgeons having lists of one case, or one or two cases, was approximately 50%, not small [e.g., 5%] or large [e.g., 95%] (Table 3). Therefore, we report the odds ratios only in supplemental content.) Contrasts of predictive probabilities between genders were calculated using the Stata margins command. Numerical variables were handled in the regression analyses by using their average estimates over all observations. Confidence intervals were calculated using the delta method to apply the regression model's variance estimates. These contrasts are reported in the paper because they combine the coefficients of gender alone and the interactions in the nonlinear (logistic) regression.
Several sensitivity analyses were performed. First, individual independent variables (e.g., cases per quarter) were removed from the models to understand what adjustments were influencing results. Second, the dependent variable was changed from one case per list to one or two cases per list (Table 3). Third, we examined sensitivity of confidence interval width (i.e., statistical power) to the numbers of lists (i.e., hypothetical benefit of adding years of data) versus the numbers of facilities (i.e., would need to add more US state[s]).
Regarding study size, in addition to the preceding sensitivity analysis, we also evaluated our decision to proceed (i.e., test our hypothesis #1) after reading the report by Yesantharao et al. from three hospitals [21]. To judge minimum differences that are managerially important, the University of Iowa Department of Anesthesia routinely holds meetings addressing requirements to provide anesthesia services for procedural based physicians who will do 1 or 2 cases in a procedural suite every two weeks (e.g., pain medicine physicians) [33]. Because such proceduralists generally can fill an operating room for a half-day, managers regularly treat issues related to 1 out of 20 lists as appropriately calling for their attention [33], where 20 = (10 days in 2 weeks) × (2 of these brief lists per workday per anesthesia practitioner). Being conservative, statistically, if the 99% confidence interval (P < .01) for the unadjusted absolute difference between female and male surgeons excluded 5% (i.e., 1 out of 20 lists), then the study size would be sufficiently large to detect a managerially important difference at the University of Iowa. Such considerations underestimate value because there also are organizational and societal consequences for policies being biased (e.g., based on gender).
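The arithmetic behind the 5% managerial threshold, and the rule that the 99% confidence interval should exclude it, can be written as a short Python sketch. The function names and the confidence limits in the example are hypothetical; the paper's actual limits are in its tables and supplemental content.

```python
def managerial_threshold(workdays_in_two_weeks=10, brief_lists_per_workday=2):
    """One list out of the brief lists covered in two weeks, as a proportion."""
    return 1.0 / (workdays_in_two_weeks * brief_lists_per_workday)


def excludes_threshold(ci_lower, ci_upper, threshold):
    """True when the confidence interval lies entirely above or entirely below the threshold."""
    return ci_lower > threshold or ci_upper < threshold


threshold = managerial_threshold()  # 1 out of 20 lists

# Hypothetical 99% confidence limits for an absolute difference in proportions.
example_excludes = excludes_threshold(0.07, 0.13, threshold)
example_includes = excludes_threshold(0.03, 0.08, threshold)
print(threshold, example_excludes, example_includes)
```

An interval lying wholly above 5% indicates a difference that is both statistically reliable and large enough to call for managerial attention under this criterion.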

Results
We analyzed 1,509,190 lists of cases performed by 8875 surgeons (Tables 1 and 2).
While male surgeons performed a median of 29 cases per quarter (interquartile range 12 to 61), female surgeons performed a median of 13 cases per quarter (interquartile range 7 to 30) (Table 3). Male surgeons' mode was the 10th decile, with a left-skewed distribution, while female surgeons' mode was the 2nd decile, with a right-skewed distribution. Suppose that a male surgeon's quarter and a female surgeon's quarter were selected randomly. Then, for 66% of such selections (i.e., the c-statistic), the male surgeon would have performed more cases (99% confidence interval 65% to 66%, P < .0001).
While 44.2% of male surgeons' lists of cases (i.e., surgeon-date-facility combination) had one case, 54.6% of female surgeons' lists had one case (Tables 3 and 4, P < .0001; Fig 1, top row). The value of 54.6% was significantly greater than half (P < .0001). While 65.8% of male surgeons' lists had one or two cases, 74.8% of female surgeons' lists had one or two cases (Tables 3 and 4, P < .0001).
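The Methods note that odds ratios overstate relative risks when prevalence is near 50%. A minimal Python sketch using the unadjusted percentages above (44.2% and 54.6%) shows the size of that gap. These are crude values computed from the rounded percentages, not the regression output in the supplemental content, and the variable names are illustrative only.

```python
# Unadjusted proportions of lists comprising one case, as reported in the text.
p_male = 0.442
p_female = 0.546

# Difference of proportions, the scale on which the hypotheses were formulated.
risk_difference = p_female - p_male

# Relative risk: ratio of the two proportions.
relative_risk = p_female / p_male

# Odds ratio: ratio of the odds p / (1 - p) for each group.
odds_female = p_female / (1.0 - p_female)
odds_male = p_male / (1.0 - p_male)
odds_ratio = odds_female / odds_male

print(round(risk_difference, 3), round(relative_risk, 2), round(odds_ratio, 2))
```

With these inputs the relative risk is about 1.24 whereas the odds ratio is about 1.52, which is why reporting differences of proportions is the less misleading choice at this prevalence.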

Hypothesis 1: Female surgeons have a greater fraction of their lists with only one case
More lists of female surgeons than male surgeons comprised one case (10.4%, P < .0001; Fig 1 top row). Therefore, hypothesis #1 was not rejected. The same result held for lists with one or two cases (9.1%, P < .0001). Both two-sided lower 99% confidence limits exceeded 5.0%, showing managerially important effect sizes.

Hypothesis 2: Gender differences in fractions of lists with only one case persist after incorporating confounders
After adjustment for covariates (e.g., quarterly operative caseloads and specialty), there were no significant differences between female and male surgeons in the absolute percentage differences of lists of cases that comprised one case (-2.1%, P = .03; Fig 1). Therefore, hypothesis #2 was rejected. The same result held for lists with one or two cases (-1.8%, P = .05) (Table 4 and Fig 2). The reversals of signs of the estimates from differences suggesting less access of female versus male surgeons (10.4% and 9.1%) to the opposite (-2.1% and -1.8%) were attributable to the differences in quarterly operative caseloads between female and male surgeons (Table 4 and Figs 1 and 2).
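How a crude difference can reverse sign once caseload is taken into account can be illustrated with a toy example. The Python sketch below uses direct standardization over two caseload strata. This is a deliberately simpler technique than the paper's logistic regression with margins contrasts, and all counts are hypothetical values chosen only to show the mechanism.

```python
# Hypothetical counts of lists: {gender: {caseload stratum: (single-case lists, all lists)}}.
toy = {
    "female": {"low": (544, 800), "high": (56, 200)},
    "male": {"low": (210, 300), "high": (210, 700)},
}


def crude_proportion(by_stratum):
    """Proportion of single-case lists, ignoring caseload stratum."""
    single = sum(s for s, _ in by_stratum.values())
    total = sum(n for _, n in by_stratum.values())
    return single / total


def stratum_weights(data):
    """Share of all lists (both genders combined) falling in each stratum."""
    totals = {}
    for by_stratum in data.values():
        for stratum, (_, n) in by_stratum.items():
            totals[stratum] = totals.get(stratum, 0) + n
    grand_total = sum(totals.values())
    return {stratum: n / grand_total for stratum, n in totals.items()}


def standardized_proportion(by_stratum, weights):
    """Stratum-specific proportions averaged over a common stratum distribution."""
    return sum(weights[stratum] * s / n for stratum, (s, n) in by_stratum.items())


weights = stratum_weights(toy)
crude_diff = crude_proportion(toy["female"]) - crude_proportion(toy["male"])
adjusted_diff = (standardized_proportion(toy["female"], weights)
                 - standardized_proportion(toy["male"], weights))
print(round(crude_diff, 3), round(adjusted_diff, 3))
```

In this toy example the female surgeons' lists are concentrated in the low-caseload stratum, where single-case lists are common for both genders, so the crude difference is positive even though the within-stratum difference is slightly negative.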

Discussion
Operating room managers have a responsibility to ensure that surgeons and other proceduralists have access to operating room time that is based fairly on the medical requirements of their patients and their quarterly caseloads (workloads). Neither incentive programs nor surgical scheduling should be biased, either directly or indirectly, with respect to protected characteristics (e.g., gender, age) of employees and medical staff [22,34]. The operating room manager has at most negligible influence over a surgeon's workload during a quarter but does affect how and when those surgical cases are performed. Our managerial epidemiology study shows the generalizability of Yesantharao et al.'s earlier study of three hospitals to the hundreds of hospitals and ambulatory surgery centers throughout a state [21]. As they found, our results (hypothesis #1) showed that surgeon gender was associated with differences in surgical case scheduling [21]. Because of significant differences in quarterly caseloads between male and female surgeons [22], hospitals should expect female surgeons to have substantively (>5%) greater incidences of elective surgical days with only one case, or with one or two cases, than male surgeons. Our results (hypothesis #2) also show that such differences between male and female surgeons in their lists (Fig 1 top row) were not due to systematic bias in operating room scheduling (Fig 1 next ten rows). By this, we mean that when total quarterly workload averages three cases per week, whether that would be one day with three cases or three days each with one case does not significantly differ based on surgeon gender. The frequency of surgical lists including one case was explained principally by the surgeons' quarterly caseloads, a finding that is novel because it was examined among thousands of surgeons.

Addressing needs of low caseload surgeons (i.e., the majority of female surgeons)
Table 4, footnote b: Deciles are given in Tables 2 and 3. https://doi.org/10.1371/journal.pone.0283033.t004
Fig 1 (caption). With adjustment for quarterly operative caseloads, the gender association (male < female) is mitigated fully, because male surgeons' mode was the 10th decile while female surgeons' mode was the 2nd decile (i.e., they performed fewer cases). Put another way, select at random a male surgeon-quarter and a female surgeon-quarter, both in the same decile of quarterly caseload. The paired surgeons perform surgery on average for the same numbers of workdays to complete those cases. That is what the operating room manager controls and that influences surgeon productivity. For inferential results controlling for specialty see Table 4 and Fig 2. The potential interaction included in the statistical model is shown by the medians of male and female surgeons matching for the lower (e.g., 1st and 2nd) deciles but not for the upper (e.g., 9th and 10th) deciles. https://doi.org/10.1371/journal.pone.0283033.g001
Managers need to recognize that female versus male surgeons' experiences with case scheduling may differ substantively, because surgeons with single cases (i.e., more often female surgeons) less often have first case starts and therefore have reduced personal productivity from greater tardiness of waiting for preceding late running cases to finish (Fig 1) [7,21,29]. There are essentially no conditions wherein these surgeons performing single cases should have been allocated individual block time, because they could not fully fill the workday and partial use of a day cannot be estimated sufficiently accurately [23,28]. Nevertheless, there is much that operating room managers can do to facilitate case scheduling for these low caseload surgeons, often performing a single "to-follow" case [7,29]. Such interventions may be country dependent, with prior studies being principally from the USA. Managers can ensure there are processes such that those surgeons with one case to be performed can get their case on the operating room schedule into their service's shared (allocated) time several weeks in advance when the case is without resource constraints (e.g., can only be performed in a specific room). That can be done probabilistically, not assigning the case to a specific room, but rather providing a confirmed date with an expectation that there will be sufficient capacity for the case, but not with a specified start time [33,35,36]. If the service's time fills, and the low-caseload surgeon has a case to schedule, the manager can release the time of another service forecasted to have substantial unused time [33,[37][38][39]. When feasible, plan a brief gap (e.g., 30 min) in addition to the average turnover time between the estimated end time of the preceding surgeon in the operating theatre and the start time of the surgeon with the single case [40][41][42]. Adding a brief gap is especially useful if the preceding cases have a substantial probability of finishing at least 1 hour later than scheduled [42][43][44]. Such a process reduces the amount of time that a surgeon would be in the facility waiting idly for an operating room to be available. Preceding surgical cases finishing late and thereby reducing the surgeon's productivity happens far more often than delays in the availability of surgeons due to travel disruptions when coming from clinics or from other hospitals [45].
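The planned-gap recommendation in this subsection amounts to simple time arithmetic, sketched below in Python. The function name, the 25-minute average turnover, and the example date and time are hypothetical; only the 30-minute gap comes from the text.

```python
from datetime import datetime, timedelta


def planned_start(preceding_estimated_end, mean_turnover_min, gap_min=30):
    """Planned start of the single to-follow case: estimated end of the preceding
    surgeon's list, plus the average turnover time, plus a brief planned gap."""
    return preceding_estimated_end + timedelta(minutes=mean_turnover_min + gap_min)


# Hypothetical example: preceding list estimated to end at 11:00, 25 min average turnover.
start = planned_start(datetime(2019, 3, 4, 11, 0), mean_turnover_min=25)
print(start.strftime("%H:%M"))
```

Setting `gap_min=0` gives the conventional schedule without a gap, which makes it easy to compare the two planned arrival times for the surgeon.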

Limitations
Fig 2 (caption). Error bars show two-sided 99% confidence intervals for the differences in these expected values. The Stata commands and output of these logistic regression models are in the supplemental content. The unadjusted models used gender alone as the independent variable, with robust variance estimation to be consistent with the other models. The full adjusted model included gender, decile of surgeon's cases during the quarter, interaction between gender and caseload, surgeon's specialty, and quarter (Tables 2 and 3), and was estimated with clustering by facility. The model with adjustment for quarterly operative caseload only excluded the interaction but was estimated with clustering by facility (Table 4). When estimated without clustering, the point estimate was the same but the confidence intervals were narrower, ± 0.2% (Table 4). All variables other than gender were handled by using their average estimates over all observations. The figure shows that unadjusted differences between male and female surgeons vanish when accounting for the effect of the confounding variables, most importantly caseload. See supplemental content for these averages. https://doi.org/10.1371/journal.pone.0283033.g002
Our finding of absence of gender-based bias in how operating room cases were scheduled does not negate the fundamentally different question as to why female surgeons performed far fewer cases per quarter. For example, throughout Ontario, female surgeons received fewer procedural referrals than male surgeons, principally because male physicians more often referred patients to male surgeons than female surgeons [46]. That we cannot address this different question is mitigated by the fact that factors influencing surgeon referrals rarely are under the purview of operating room managers (i.e., our study addresses systematic bias in operating room scheduling, not systemic gender-based bias such as due to referral patterns) [22].
In addition to surgeons operating in the state of Florida [9,26], the observation of overall one or two elective cases per week (Table 2) has similarly been found in multiple previous managerial epidemiology studies from the USA, including three University hospitals [47][48][49], a large community hospital [50], a citywide health system [29], statewide in Iowa [8,25], and statewide in Florida among pain medicine physicians' operative cases [33]. Although suggesting generalizability and the importance of studying surgeons performing single cases on their operative days, our results show that findings may differ among countries depending on surgeons' average caseloads. Because we studied major therapeutic and major diagnostic procedures (i.e., cases requiring general anesthesia or major conduction blocks) (Table 1), results may differ for countries with surgeons spending proportionately less time evaluating patients, performing office-based procedures, and/or performing minor therapeutic and diagnostic procedures.
We used three years of data (ending just before the start of the COVID-19 pandemic) to achieve a contemporaneous cross-sectional analysis. Had we increased the months of data (or even, potentially, the numbers of surgeons), neither would have substantively reduced our confidence interval widths because these were limited, principally, by the numbers of facilities (Table 4 and Fig 1). In Florida, the number of hospitals does not change markedly from year to year. We studied all non-federal hospitals and all ambulatory surgery centers statewide in Florida. Therefore, greater precision would be obtained by adding more state(s). By doing so, it might be possible to determine if the marginal P-values suggesting possible bias favoring female surgeons are a reliable finding. However, we doubt that such knowledge would be of substantive value because, in retrospect, the practical use would be for the above guidance to the individual operating room manager at single facilities, not to inform statewide or national policy. Furthermore, studying subsets of the Florida data (e.g., by size of facility) to assess potential contributors to bias would be uninformative because of the wide confidence intervals even when including all data. Thus, we recommend that the focus of additional research related to potential gender bias affecting operating room management should be the referral of operative versus non-operative patients to surgeons differing by gender, as reported from Canada [46]. In other words, based on our findings, we suggest future study to understand better why female surgeons are performing fewer cases per quarter, rather than focusing on how the cases are being scheduled.
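Why precision depends on the number of facilities rather than the number of lists can be illustrated with the textbook design-effect approximation for clustered data, 1 + (m - 1) × ICC, where m is the number of lists per facility. The Python sketch below assumes equal cluster sizes and a hypothetical intraclass correlation of 0.05. It is not the cluster-robust variance estimator used in the paper's logistic regressions, and the facility and list counts are round illustrative numbers.

```python
import math


def se_clustered_proportion(p, n_facilities, lists_per_facility, icc):
    """Approximate standard error of a proportion with equal-sized clusters,
    inflating the simple binomial variance by the design effect 1 + (m - 1) * icc."""
    n_lists = n_facilities * lists_per_facility
    design_effect = 1.0 + (lists_per_facility - 1) * icc
    return math.sqrt(p * (1.0 - p) / n_lists * design_effect)


# Hypothetical inputs for illustration only.
base = se_clustered_proportion(0.5, n_facilities=600, lists_per_facility=2500, icc=0.05)
more_lists = se_clustered_proportion(0.5, n_facilities=600, lists_per_facility=5000, icc=0.05)
more_facilities = se_clustered_proportion(0.5, n_facilities=1200, lists_per_facility=2500, icc=0.05)

print(round(more_lists / base, 3), round(more_facilities / base, 3))
```

Under these assumptions, doubling the lists per facility leaves the standard error nearly unchanged, whereas doubling the number of facilities shrinks it by a factor of about 0.71 (one over the square root of two).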
The National Provider Identifier database currently only allows providers to self-report a gender of "female" or "male" [30]. Thus, potential influence of other gender identities could not be assessed. However, the prevalence of non-binary surgeons [51] likely is too small to affect our conclusions.
Finally, our paper was limited to data related to individual surgeon productivity and related total growth within operating room budgets [8,23,25,26,33]. Thus, our paper and results should not be interpreted as having anything to do with operating room nursing or anesthetist efficiency, productivity, utilization, scheduling, or assignment [35][36][37][38][39]. The US administrative state level databases lack case duration data, as needed to analyze such endpoints. However, the latter was not a limitation for our tested hypotheses, because with most surgeons performing one or two cases per day, their individual percentage utilizations could not be measured accurately anyway [23,28].

Conclusions
Surgeons' frequencies of performing one case on operative days were highly dependent on their quarterly caseloads. The differences between male and female surgeons in their lists were not due to systematic bias in operating room scheduling. Thus, health policy planners looking to reduce gender bias related to surgeons' caseloads should not be focusing on operating room scheduling and managers as potential sources, but rather on external factors such as gender bias in referrals of cases to surgeons. We found that surgeons performing only one case on their operative (anesthesia) workdays more often were female than male surgeons. This is a previously unrecognized reason why it is important for operating room managers to ease the burden for low caseload surgeons. Improving those surgeons' access will support better access of female surgeons.
Supporting information
S1 File. Full statistical output from Stata v17.0, including commands and comments, is provided in the supplemental content. These are listed in the same sequence as the Methods and Results for readers interested in more detail or who want to replicate our work with other state or provincial datasets. (PDF)